-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Support subscript and nested functions in grouping queries #5998
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks mostly good to me - thanks @vpapavas! A few inline comments and more test requests 😂
ksqldb-engine/src/main/java/io/confluent/ksql/analyzer/AggregateAnalyzer.java
Outdated
Show resolved
Hide resolved
ksqldb-engine/src/main/java/io/confluent/ksql/analyzer/AggregateAnalyzer.java
Outdated
Show resolved
Hide resolved
ksqldb-engine/src/main/java/io/confluent/ksql/analyzer/AggregateAnalyzer.java
Outdated
Show resolved
Hide resolved
ksqldb-engine/src/main/java/io/confluent/ksql/analyzer/AggregateAnalyzer.java
Outdated
Show resolved
Hide resolved
@@ -359,5 +401,14 @@ public Void visitQualifiedColumnReference( | |||
) { | |||
throw new UnsupportedOperationException("Should of been converted to unqualified"); | |||
} | |||
|
|||
@Override | |||
public Void visitSubscriptExpression(final SubscriptExpression node, final Void context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there any other expressions that we should be looking that can match exactly other than function calls and subscript expressions? i.e. what about arithmetic expressions (SELECT AS_VALUE(a + b) AS foo GROUP BY a + b
)
I think it makes sense to implement this in visitExpression
instead of in each individual expression type. We would only "override" this behavior when we need to (e.g. inside visitFuncitonCall
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to check for arithmetic expressions.
Yes, this is a good point in the sense that we need to figure out which expressions need special handling. However, there is no visitExpression
in TraversalExpressionVisitor
unless I misunderstand what you meant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agavra The approach of putting the "common" code in process
doesn't work. Reason being is that if the parent expression exists in the group by e.g. (a+b)
, it will set the flag foundExpressionInGroupBy
to true and will continue visiting the children, a
and b
. If a child doesn't exist in the group by, it will set the flag to false. All the rest of the children then (b
in this example), will fail and will wrongly get added to the nonAggParams
.
TLDR: the flag must be set to false only after the entire tree of the expression has been visited. And that is only possible to do in the specific method for each expression
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
look at my newest comment, I think we have that problem anyway. I still think we should be able to do this in the common process if you take the suggestion I have there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ksqldb-functional-tests/src/test/resources/query-validation-tests/group-by.json
Show resolved
Hide resolved
ksqldb-engine/src/main/java/io/confluent/ksql/analyzer/AggregateAnalyzer.java
Outdated
Show resolved
Hide resolved
ksqldb-functional-tests/src/test/resources/query-validation-tests/group-by.json
Show resolved
Hide resolved
Also don't forget to check in the historical plans:
|
Look at |
ksqldb-engine/src/main/java/io/confluent/ksql/analyzer/AggregateAnalyzer.java
Outdated
Show resolved
Hide resolved
ksqldb-engine/src/main/java/io/confluent/ksql/analyzer/AggregateAnalyzer.java
Show resolved
Hide resolved
ksqldb-engine/src/main/java/io/confluent/ksql/analyzer/AggregateAnalyzer.java
Outdated
Show resolved
Hide resolved
Hey @vpapavas, There's another potential issue around this you may have fixed, or which is worth looking at at the same time. The thread is in community slack: https://confluentcommunity.slack.com/archives/C6UJNMY67/p1594325261431500 However, the short version is that this doesn't currently work: CREATE TABLE my_agg_z_t AS
SELECT
id as id_key,
number as number_key,
CAST(id as STRING)+'|+|'+CAST(number as STRING) AS crk,
AS_VALUE(CAST(id as STRING)) as id,
AS_VALUE(CAST(number as STRING)) as number,
AVG(value) AS someavg FROM blah_s
WINDOW TUMBLING (size 1 hour)
GROUP BY CAST(id as STRING)+'|+|'+CAST(number as STRING)
EMIT CHANGES; Resulting in the error:
I think this PR fixes some of this, but maybe not all. It would be good to add a test case that copies the key columns, i.e. the grouping expressions, into the value using |
Hey @big-andy-coates , the query you posted shouldn't work, i.e., it is correct that it fails. One can only use grouping columns in non-aggregate functions. Namely, the grouping column in this example is For the query to work as is one workaround is for the grouping columns to contain additionally
and this query with my changes works. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
almost lgtm, for what it's worth I think your code is much easier to follow than what it used to be :D
ksqldb-engine/src/main/java/io/confluent/ksql/analyzer/AggregateAnalyzer.java
Outdated
Show resolved
Hide resolved
ksqldb-engine/src/main/java/io/confluent/ksql/analyzer/AggregateAnalyzer.java
Outdated
Show resolved
Hide resolved
ksqldb-engine/src/main/java/io/confluent/ksql/analyzer/AggregateAnalyzer.java
Outdated
Show resolved
Hide resolved
@@ -359,5 +401,14 @@ public Void visitQualifiedColumnReference( | |||
) { | |||
throw new UnsupportedOperationException("Should of been converted to unqualified"); | |||
} | |||
|
|||
@Override | |||
public Void visitSubscriptExpression(final SubscriptExpression node, final Void context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
look at my newest comment, I think we have that problem anyway. I still think we should be able to do this in the common process if you take the suggestion I have there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Minor comments leftover
* then the UnqualifiedColumnReference is added to the schema. | ||
* | ||
* For validation, the visitor checks that: | ||
* 1) expressions in non-aggregate functions are part of the grouping clause, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* 1) expressions in non-aggregate functions are part of the grouping clause, | |
* 1) expressions not in aggregate functions are part of the grouping clause, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually correct what I wrote :P It will throw if expressions used in aggregate functions are not part of the group by
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will throw if expressions used in aggregate functions are not part of the group by
now I'm more confused 😂 expressions in aggregate functions must not be part of the group by - SELECT id, SUM(foo) FROM s GROUP BY id
works (foo is an aggregate function parameter not in group by)
What you had originally written is true, but does not cover everything. It says expressions in UDFs (not UDAFs) must be part of the grouping clause (true). But it is also true that even if they're not in a UDF, as long as they're not in an aggregate function, they must be in the grouping clause:
--- valid
SELECT non_agg_fun(foo), COUNT(*) from s GROUP BY foo; -- (expression in non-aggregate function)
SELECT foo, COUNT(*) from s GROUP BY foo; -- (expression not in aggregate function)
-- invalid
SELECT foo, COUN(T*) from s GROUP BY bar; -- (expression not in aggregate function, and not in grouping)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, I meant expressions in non-aggregate functions must be part of group-by. So much confusion :) So, the comments say that the validator will throw if any of the conditions does not hold
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah wait, I think I now get what you are suggesting. You are suggesting to make the sentence more general. OK, will make the change
ksqldb-engine/src/main/java/io/confluent/ksql/analyzer/AggregateAnalyzer.java
Outdated
Show resolved
Hide resolved
ksqldb-engine/src/main/java/io/confluent/ksql/analyzer/AggregateAnalyzer.java
Outdated
Show resolved
Hide resolved
ksqldb-functional-tests/src/test/resources/query-validation-tests/group-by.json
Show resolved
Hide resolved
…fluentinc#5998) * fixed handling subscript and nested functions * remove intermediate topics from test file * addressed comments, handle struct and arithmetic, added plans * fixed window bounds error * adding historic plans * addressed Almog's comments * minor fixes
…fluentinc#5998) * fixed handling subscript and nested functions * remove intermediate topics from test file * addressed comments, handle struct and arithmetic, added plans * fixed window bounds error * adding historic plans * addressed Almog's comments * minor fixes
Description
Fixes #5906 and #5967
Testing done
Added test cases in
group-by.json
Reviewer checklist